Improved Speech Recognition using Adaptive Audio-visual Fusion via a Stochastic Secondary Classifier
نویسندگان
چکیده
The adaptive fusion of video and audio is one of the fundamental pursuits of audio visual speech recognition (AVSR). In this paper the use of a high dimensional secondary classijier on the word likelihood scores from both the audio and video modalities is investigated fo r the purposes of adaptive fusion. Results are presented that lie above or equal to the boundary of catastrophic fusion across a number of audio noise
منابع مشابه
Improving visual noise insensitivity in small vocabulary audio visual speech recognition applications
Visual noise insensitivity is important to audio visual speech recognition (AVSR). Visual noise can take on a number of forms such as varying frame rate, occlusion, lighting or speaker variabilities. In this paper the use of a high dimensional secondary classifier on the word likelihood scores from both the audio and video modalities is investigated for the purposes of adaptive fusion. Prelimin...
متن کاملAudio-Visual Speaker Identification via Adaptive Fusion Using Reliability Estimates of Both Modalities
An audio-visual speaker identification system is described, where the audio and visual speech modalities are fused by an automatic unsupervised process that adapts to local classifier performance, by taking into account the output score based reliability estimates of both modalities. Previously reported methods do not consider that both the audio and the visual modalities can be degraded. The v...
متن کاملKeyword Spotting Based On Decision Fusion
Automatic speech recognition (ASR) technology is available now-a-days in all handsets where keyword spotting plays a vital role. Keyword spotting performance significantly degrades when applied to real-world environment due to background noise. As visual features are not affected much by noise this provides better solution. In this paper, audio-visual integration is proposed which combines audi...
متن کاملTwo-Level Bimodal Association for Audio-Visual Speech Recognition
This paper proposes a new method for bimodal information fusion in audio-visual speech recognition, where cross-modal association is considered in two levels. First, the acoustic and the visual data streams are combined at the feature level by using the canonical correlation analysis, which deals with the problems of audio-visual synchronization and utilizing the cross-modal correlation. Second...
متن کاملAn investigation of HMM classifier combination strategies for improved audio-visual speech recognition
The combining of independent audio and visual HMM classifiers (late integration) has been shown to out perform the combination of audio and visual features in a single HMM classifier (early integration) when either or both modalities are presented with distortion for the task of speech recognition. Theoretical foundations for the optimal combination of these audio and video classifiers are stil...
متن کامل